Close Shadowing Natural Versus Synthetic Speech
Author
Abstract
Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms, whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by current text-to-speech systems.
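The abstract reports shadowing delays (about 70 ms for natural speech, typically over 100 ms for synthetic speech) but does not say how they were measured. As a rough illustration only, and assuming the stimulus and the shadower's response are available as time-aligned recordings at the same sample rate, one simple way to estimate such a delay is to pick the lag that maximizes the cross-correlation between the two signals. The function name `estimate_shadowing_delay_ms` and its `max_lag_ms` parameter are hypothetical; this is not the authors' procedure.

```python
import numpy as np


def estimate_shadowing_delay_ms(stimulus, response, sample_rate, max_lag_ms=500.0):
    """Estimate how far `response` lags behind `stimulus`, in milliseconds.

    Hypothetical helper (not from the paper): returns the non-negative lag
    that maximizes the cross-correlation of the two mean-removed signals.
    """
    x = np.asarray(stimulus, dtype=float)
    y = np.asarray(response, dtype=float)
    # Remove DC offsets so they do not dominate the correlation.
    x = x - x.mean()
    y = y - y.mean()

    max_lag = int(round(max_lag_ms * sample_rate / 1000.0))
    best_lag, best_score = 0, -np.inf
    for lag in range(0, min(max_lag, len(y) - 1) + 1):
        # Compare stimulus sample t with response sample t + lag.
        n = min(len(x), len(y) - lag)
        score = float(np.dot(x[:n], y[lag:lag + n]))
        if score > best_score:
            best_lag, best_score = lag, score

    # Convert the best lag from samples to milliseconds.
    return 1000.0 * best_lag / sample_rate
```

A real shadower speaks with a different voice than the stimulus, so raw-waveform correlation is a simplification; aligning amplitude envelopes or hand-labelled segment onsets would be more robust in practice.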
Related articles
Close shadowing natural vs. synthetic speech
Close shadowing experiments involving natural and synthetic stimuli are described here. Preliminary results show that speakers are able to follow natural stimuli with an average delay of less than 50 ms, whereas this delay exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to prosody.
Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence
To shed light on the question of whether humans converge phonetically to synthesized speech, a shadowing experiment was conducted using three different types of stimuli – natural speaker, diphone synthesis, and HMM synthesis. Three segment-level phonetic features of German that are well-known to vary across native speakers were examined. The first feature triggered convergence in roughly one third...
The shadow of a doubt? Evidence for perceptuo-motor linkage during auditory and audiovisual close-shadowing
One classical argument in favor of a functional role of the motor system in speech perception comes from the close-shadowing task in which a subject has to identify and to repeat as quickly as possible an auditory speech stimulus. The fact that close-shadowing can occur very rapidly and much faster than manual identification of the speech target is taken to suggest that perceptually induced spe...
Investigating Phonetic Convergence in a Shadowing Experiment with Synthetic Stimuli
This paper presents a shadowing experiment with synthetic stimuli, whose goal is to investigate phonetic convergence in a human-computer interaction paradigm. Comparisons to the results of a previous experiment with natural stimuli are made. The process of generating the synthetic stimuli, which are based on the natural ones, is described as well.
Sampling-Based Speech Parameter Generation Using Moment-Matching Networks
This paper presents sampling-based speech parameter generation using moment-matching networks for Deep Neural Network (DNN)-based speech synthesis. Although people never produce exactly the same speech, even when they try to express the same linguistic and paralinguistic information, typical statistical speech synthesis produces exactly the same speech every time, i.e., there is no inter-utterance variatio...
Journal: I. J. Speech Technology
Volume: 6, Issue:
Pages: -
Publication date: 2003